Deep Q-Learning with Prioritized Sampling
Authors
Abstract
The combination of modern reinforcement learning and deep learning has brought significant breakthroughs to a variety of domains requiring both rich perception of high-dimensional sensory inputs and policy selection. A recent advance in using deep neural networks as function approximators, termed Deep Q-Networks (DQN), proves very powerful for solving problems approaching real-world complexity, such as Atari 2600 games. To remove temporal correlations between observed transitions, DQN uses a sampling mechanism called experience replay, which simply replays transitions at random from the memory buffer. However, this mechanism ignores the relative importance of the transitions in the buffer. In this paper, we incorporate prioritized sampling into DQN as an alternative. Our experimental results demonstrate that DQN with prioritized sampling achieves better performance, in terms of both average score and learning speed, on four Atari 2600 games.
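To make the contrast with uniform replay concrete, the following is a minimal sketch of a replay buffer that samples transitions with probability proportional to a priority derived from the TD error. It is an illustration of proportional prioritized sampling in general, not the paper's implementation; the class name, hyperparameter values, and buffer layout are all assumptions.

import numpy as np

class PrioritizedBuffer:
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        # alpha controls how strongly priorities skew sampling;
        # eps keeps every transition sampleable (both assumed values).
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], [], 0

    def add(self, transition, td_error):
        # Larger TD error -> larger priority.
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:  # overwrite the oldest entry once the buffer is full
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
            self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, rng=None):
        # Draw indices with probability proportional to priority,
        # rather than uniformly as in vanilla experience replay.
        rng = rng or np.random.default_rng()
        p = np.asarray(self.priorities)
        idx = rng.choice(len(self.data), size=batch_size, p=p / p.sum())
        return [self.data[i] for i in idx], idx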
Similar Resources
ViZDoom: DRQN with Prioritized Experience Replay, Double-Q Learning, & Snapshot Ensembling
ViZDoom is a robust, first-person shooter reinforcement learning environment, characterized by a significant degree of latent state information. In this paper, double-Q learning and prioritized experience replay methods are tested under a certain ViZDoom combat scenario using a competitive deep recurrent Q-network (DRQN) architecture. In addition, an ensembling technique known as snapshot ensem...
Learning from Demonstrations for Real World Reinforcement Learning
Deep reinforcement learning (RL) has achieved several high profile successes in difficult decision-making problems. However, these algorithms typically require a huge amount of data before they reach reasonable performance. In fact, their performance during learning can be extremely poor. This may be acceptable for a simulator, but it severely limits the applicability of deep RL to many real-wo...
Prioritized Experience Replay
Experience replay lets online reinforcement learning agents remember and reuse experiences from the past. In prior work, experience transitions were uniformly sampled from a replay memory. However, this approach simply replays transitions at the same frequency that they were originally experienced, regardless of their significance. In this paper we develop a framework for prioritizing experienc...
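For reference, the proportional prioritization scheme of that paper samples transition i with probability P(i) determined by its priority p_i, and corrects the resulting bias with an importance-sampling weight w_i, where alpha and beta are the paper's exponents and N is the buffer size:

P(i) = \frac{p_i^{\alpha}}{\sum_k p_k^{\alpha}}, \qquad
w_i = \left( \frac{1}{N} \cdot \frac{1}{P(i)} \right)^{\beta}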
Efficient Exploration through Bayesian Deep Q-Networks
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling based Reinforcement Learning (RL) algorithm. Thompson sampling allows for targeted exploration in high dimensions through posterior sampling but is usually computationally expensive. We address this limitation by introducing uncertainty only at the output layer of the network through a Bayesian Linear Regression (BLR) mode...
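As a rough illustration of that idea (not the paper's code), the sketch below keeps an independent Gaussian posterior over the last-layer weights for each action, samples one weight vector per action to pick actions, and updates the posterior in closed form; all dimensions, variances, and names here are assumed placeholders.

import numpy as np

rng = np.random.default_rng(0)
d, n_actions = 64, 4          # last-layer feature size and action count (assumed)
sigma2, prior_var = 1.0, 1.0  # observation-noise and prior variances (assumed)

# Per-action Gaussian posterior over last-layer weights: N(mean[a], cov[a]).
mean = np.zeros((n_actions, d))
cov = np.stack([prior_var * np.eye(d) for _ in range(n_actions)])

def select_action(features):
    # Thompson sampling: draw one weight vector per action from its
    # posterior, then act greedily on the sampled Q-values.
    sampled_q = [rng.multivariate_normal(mean[a], cov[a]) @ features
                 for a in range(n_actions)]
    return int(np.argmax(sampled_q))

def update_posterior(a, X, y):
    # Closed-form Bayesian linear regression refit for action a from a
    # batch of last-layer features X (batch x d) and target Q-values y.
    X, y = np.asarray(X), np.asarray(y)
    precision = np.eye(d) / prior_var + X.T @ X / sigma2
    cov[a] = np.linalg.inv(precision)
    mean[a] = cov[a] @ (X.T @ y) / sigma2  # zero prior mean assumed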
Bayesian Deep Q-Learning via Continuous-Time Flows
Efficient exploration in reinforcement learning (RL) can be achieved by incorporating uncertainty into model predictions. Bayesian deep Q-learning provides a principled way for this by modeling Q-values as probability distributions. We propose an efficient algorithm for Bayesian deep Q-learning by posterior sampling actions in the Q-function via continuous-time flows (CTFs), achieving efficient ...